========================================================
Context: The H-1B is an employment-based, non-immigrant visa category for temporary foreign workers in the United States. For a foreign national to apply for an H-1B visa, a US employer must offer a job and petition for the H-1B visa with the US immigration department. This is the most common visa status applied for and held by international students once they complete college or higher education (Master’s, PhD) and work in a full-time position.
The columns in the dataset include:
CASE_STATUS: Status associated with the last significant event or decision. Valid values include “Certified,” “Certified-Withdrawn,” “Denied,” and “Withdrawn”.
EMPLOYER_NAME: Name of employer submitting labor condition application.
SOC_NAME: Occupational name associated with the SOC_CODE. SOC_CODE is the occupational code associated with the job being requested for temporary labor condition, as classified by the Standard Occupational Classification (SOC) System.
JOB_TITLE: Title of the job
FULL_TIME_POSITION: Y = Full Time Position; N = Part Time Position
PREVAILING_WAGE: Prevailing Wage for the job being requested for temporary labor condition. The wage is listed at annual scale in USD. The prevailing wage for a job position is defined as the average wage paid to similarly employed workers in the requested occupation in the area of intended employment. The prevailing wage is based on the employer’s minimum requirements for the position.
YEAR: Year in which the H-1B visa petition was filed
WORKSITE: City and State information of the foreign worker’s intended area of employment
lon: longitude of the Worksite
lat: latitude of the Worksite
## Observations: 3,002,458
## Variables: 11
## $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, ...
## $ CASE_STATUS <fctr> CERTIFIED-WITHDRAWN, CERTIFIED-WITHDRAWN, ...
## $ EMPLOYER_NAME <fctr> UNIVERSITY OF MICHIGAN, GOODMAN NETWORKS, ...
## $ SOC_NAME <fctr> BIOCHEMISTS AND BIOPHYSICISTS, CHIEF EXECU...
## $ JOB_TITLE <fctr> POSTDOCTORAL RESEARCH FELLOW, CHIEF OPERAT...
## $ FULL_TIME_POSITION <fctr> N, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, ...
## $ PREVAILING_WAGE <dbl> 36067.0, 242674.0, 193066.0, 220314.0, 1575...
## $ YEAR <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2...
## $ WORKSITE <fctr> ANN ARBOR, MICHIGAN, PLANO, TEXAS, JERSEY ...
## $ lon <dbl> -83.74304, -96.69889, -74.07764, -104.99025...
## $ lat <dbl> 42.28083, 33.01984, 40.72816, 39.73924, 38....
The WORKSITE variable contains both city and state information, so it’s more convenient for me to do more granular analysis if I split WORKSITE into STATE and CITY.
## Observations: 3,002,458
## Variables: 12
## $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, ...
## $ CASE_STATUS <fctr> CERTIFIED-WITHDRAWN, CERTIFIED-WITHDRAWN, ...
## $ EMPLOYER_NAME <fctr> UNIVERSITY OF MICHIGAN, GOODMAN NETWORKS, ...
## $ SOC_NAME <fctr> BIOCHEMISTS AND BIOPHYSICISTS, CHIEF EXECU...
## $ JOB_TITLE <fctr> POSTDOCTORAL RESEARCH FELLOW, CHIEF OPERAT...
## $ FULL_TIME_POSITION <fctr> N, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, ...
## $ PREVAILING_WAGE <dbl> 36067.0, 242674.0, 193066.0, 220314.0, 1575...
## $ YEAR <ord> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2...
## $ CITY <chr> "ANN ARBOR", "PLANO", "JERSEY CITY", "DENVE...
## $ STATE <ord> MICHIGAN, TEXAS, NEW JERSEY, COLORADO, MISS...
## $ lon <dbl> -83.74304, -96.69889, -74.07764, -104.99025...
## $ lat <dbl> 42.28083, 33.01984, 40.72816, 39.73924, 38....
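The split of WORKSITE into CITY and STATE can be illustrated with a minimal Python sketch. This is not the code behind the output above; `split_worksite` is a hypothetical helper, and it assumes every WORKSITE value has the form "CITY, STATE" with the state after the last comma.

```python
def split_worksite(worksite):
    """Split 'CITY, STATE' on the last comma (assumed format)."""
    city, sep, state = worksite.rpartition(",")
    if not sep:
        # No comma at all: keep the whole value as the city, state unknown.
        return worksite.strip(), None
    return city.strip(), state.strip()


# Values taken from the first rows shown in the output above.
worksites = ["ANN ARBOR, MICHIGAN", "PLANO, TEXAS", "JERSEY CITY, NEW JERSEY"]
split_rows = [split_worksite(w) for w in worksites]

for city, state in split_rows:
    print(city, "|", state)
```

Splitting on the last comma rather than the first keeps a city name intact if it happens to contain a comma itself.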
The bar chart of case status shows that the vast majority of cases in this dataset are “Certified”. My further analysis therefore uses CERTIFIED cases only, so that the results are not mixed with denied or withdrawn cases.
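As a rough illustration of this step, the sketch below counts case statuses and keeps only the certified cases. The records are toy values made up for the example; only the status labels come from the dataset description.

```python
from collections import Counter

# Toy records (illustrative only, not rows from the real dataset).
cases = [
    {"CASE_STATUS": "CERTIFIED", "PREVAILING_WAGE": 65000.0},
    {"CASE_STATUS": "CERTIFIED", "PREVAILING_WAGE": 72000.0},
    {"CASE_STATUS": "CERTIFIED-WITHDRAWN", "PREVAILING_WAGE": 36067.0},
    {"CASE_STATUS": "DENIED", "PREVAILING_WAGE": 0.0},
    {"CASE_STATUS": "CERTIFIED", "PREVAILING_WAGE": 58000.0},
    {"CASE_STATUS": "WITHDRAWN", "PREVAILING_WAGE": 61000.0},
]

# Distribution of case status: absolute counts and shares.
status_counts = Counter(case["CASE_STATUS"] for case in cases)
status_share = {s: n / len(cases) for s, n in status_counts.items()}

# Exact match, so CERTIFIED-WITHDRAWN is not treated as CERTIFIED.
certified = [case for case in cases if case["CASE_STATUS"] == "CERTIFIED"]

for status, n in status_counts.most_common():
    print(f"{status:<20} {n:>3} {status_share[status]:.1%}")
```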
The bar chart tells us that Infosys Limited filed more than twice as many H1B visa applications as Tata Consultancy did within the six-year period. The top 3 companies in this chart are all Indian information technology companies.
The most popular occupations are: computer systems analysts; software developers, applications; computer programmers; computer occupations, all other; software developers, systems software; management analysts; financial analysts; accountants and auditors; mechanical engineers; and network and computer systems administrators.
Tech occupations far outnumber non-tech occupations. Among tech occupations, computer systems analyst is the most popular occupation for H1B applicants.
For more detailed information about each of these SOC names, please look them up via this link.
Next, let’s take a look at the percentages of full-time and part-time jobs. 85.8% of H1B visa applicants have full-time jobs, while the remaining 14.2% filed H1B visa petitions based on part-time jobs.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10500 54770 65120 72550 81430 306000000
To investigate the prevailing_wage variable, a histogram is the best visualization of the distribution. However, there are more than 3 million records in the dataset, with a lot of extreme values. An alternative way to show the histogram of wages is to randomly sample about one tenth of the records and exclude the bottom 10 percent and top 5 percent of data points from the sampled dataset.
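The sampling and trimming described here can be sketched as follows. This is an illustration under assumptions, not the original code: `sample_and_trim` is a hypothetical helper, the trimming is done by rank (dropping the lowest 10% and highest 5% of the sampled values), and the wages are synthetic.

```python
import random


def sample_and_trim(wages, sample_pct=10, bottom_pct=10, top_pct=5, seed=42):
    """Sample sample_pct% of the wages, then drop the lowest bottom_pct%
    and the highest top_pct% of the sampled values (by rank)."""
    rng = random.Random(seed)
    k = max(1, len(wages) * sample_pct // 100)
    sample = sorted(rng.sample(wages, k))
    start = k * bottom_pct // 100       # first index to keep
    stop = k - k * top_pct // 100       # one past the last index to keep
    return sample[start:stop]


# Synthetic, right-skewed wages plus two extreme values (made up for the demo).
rng = random.Random(0)
toy_wages = [rng.lognormvariate(11.1, 0.3) for _ in range(10_000)]
toy_wages += [0.0, 6_000_000_000.0]

trimmed = sample_and_trim(toy_wages)
print(len(trimmed), round(min(trimmed)), round(max(trimmed)))
```

Trimming by rank means any extreme value that lands in the sample falls into one of the discarded tails, so the histogram axis is no longer stretched by it.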
Now I get a readable histogram of prevailing wages. This right-skewed distribution peaks between $60,000 and $65,000, and the right tail shows that there are fewer foreign workers as wages increase.
But the biggest flaw in this histogram is that I didn’t adjust the wages for inflation. This chart includes all the data from 2011 to 2016.
California has the most H1B visa petitions from 2011 to 2016, followed by Texas and New York state. Later we will dive in deeper to find out the H1B visa petitions by city.
There are 3,002,458 records in the dataset with 11 variables: the row index X plus 10 features (case_status, employer_name, soc_name, job_title, full_time_position, prevailing_wage, year, worksite, longitude, latitude).
There are some interesting observations derived from the univariate analysis:
The case status of the majority in this dataset is “Certified”.
The top 3 employers that filed the most H1B visa petitions for their foreign employees are all Indian IT companies.
The median prevailing wage for an H1B worker is $65,020 and the maximum wage is $6,998,000,000. The interquartile range of prevailing wage is $27,061.
California, New York and Texas filed the most H1B visas from 2011 to 2016.
The main features of interest in the dataset are prevailing wage and quantity of H1B petitions. I would like to see how the prevailing wage varies across different employers and occupations. I wonder whether some employers tend to give a more generous salary package to H1B workers than others, and whether particular occupations receive a higher salary than others. I’m also curious about changes in wages and quantity of H1B petitions from 2011 to 2016.
SOC_NAME and EMPLOYER_NAME will support this investigation, since they let me compare prevailing wages across occupations and employers.
STATE, longitude and latitude are likely to contribute to the different wage levels and quantity of H1B petitions. My question is whether H1B workers living in areas with high living expenses tend to get higher wages correspondingly compared to those in other areas, and whether some economically developed areas are more willing to hire more foreign workers.
I split the WORKSITE variable into two new variables, CITY and STATE. With this data transformation, I can easily conduct more granular analysis based on location.
Prevailing Wage:
When I investigated the distribution of prevailing wage, I found that there are many extreme values that distort the shape of the histogram. So I dug deeper into the data and noticed that the minimum wage is $0 and the maximum wage exceeds 6 billion dollars! These anomalies severely affect my analysis, so I decided to discard the extreme values and focus only on the middle range of the wage data.
Case Status:
At first glance, it was hard for me to believe that there were so many H1B visa petitions during the past six years, because a total of only 85,000 cap-subject H1B visas can be issued each year. Of the 85,000 cap-subject visas, 65,000 are available for the Regular Cap, while 20,000 are available for the ADE (Advanced Degree Exemption) Cap.
After extensive research, I found the answer on the Kaggle discussion forum: the data contains new H1B petitions (before the lottery), extension petitions, and positions exempt from the H-1B visa cap (PhDs, researchers). For the CASE_STATUS, “CERTIFIED” does not mean the applicant got his/her H1B visa approved; it just means that he/she is eligible to file an H1B.
| YEAR | CALIFORNIA | NEW YORK | TEXAS |
|---|---|---|---|
| 2011 | 56252 | 35244 | 26851 |
| 2012 | 64537 | 37086 | 31841 |
| 2013 | 72171 | 36460 | 36408 |
| 2014 | 85164 | 42169 | 45091 |
| 2015 | 100710 | 47703 | 55066 |
| 2016 | 104070 | 51293 | 59694 |
From the boxplot we can see that the median wage of Microsoft exceeds that of all the other major sponsor companies, given the wage range between $0 and $15,000. The interquartile range of prevailing wages at Tata Consultancy is the smallest among these companies; in other words, Tata Consultancy has the least variation in wages for the middle 50% of H1B workers.
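The two quantities read off the boxplot, the median and the interquartile range per employer, can be computed as in the sketch below. The employer names and wages are placeholders, `wage_spread_by_group` is a hypothetical helper, and the quartile method ("inclusive") is an assumption that may differ slightly from the one behind the plot.

```python
from collections import defaultdict
from statistics import median, quantiles


def wage_spread_by_group(rows):
    """Return {group: {'median': ..., 'iqr': ...}} for (group, wage) pairs.

    Each group needs at least two wages for the quartiles to be defined.
    """
    groups = defaultdict(list)
    for name, wage in rows:
        groups[name].append(wage)
    spread = {}
    for name, wages in groups.items():
        q1, _, q3 = quantiles(wages, n=4, method="inclusive")
        spread[name] = {"median": median(wages), "iqr": q3 - q1}
    return spread


# Placeholder employers with made-up wages.
toy_rows = (
    [("EMPLOYER A", w) for w in (60000, 62000, 64000, 66000, 68000)]
    + [("EMPLOYER B", w) for w in (50000, 70000, 90000, 110000, 130000)]
)
spread = wage_spread_by_group(toy_rows)
for name, stats in spread.items():
    print(name, stats["median"], stats["iqr"])
```

A small interquartile range, as for "EMPLOYER A" here, is what the text means by little variation in wages for the middle 50% of workers.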
| SOC_NAME | median_wage |
|---|---|
| PHYSICIANS AND SURGEONS, ALL OTHTER | 230605.0 |
| UROLOGISTS | 213158.0 |
| STRUCTURAL METAL FABRICATORS AND FITTERS | 204090.0 |
| SECURITIES, COMMODITIES, AND FINANCIAL SERVICES S | 201510.0 |
| FAMILY & GENERAL PRACTITIONERS | 194188.8 |
| PEDIATRICIANS | 189731.0 |
| INTERNISTS | 187200.0 |
| DENTIST | 182874.0 |
| PHYSICIAN AND SURGEONS, ALL OTHER | 178235.2 |
| PHYSICIANS AND SURGEONS | 174179.0 |
Because the dataset has a lot of outliers and is severely skewed, using the median wage as the metric to compare prevailing wages of different occupations helps reduce distortion and gives a better picture. Based on the median wage, the occupation with the highest median wage is PHYSICIANS AND SURGEONS. Of the top 10 high-income occupations, 8 are in the medical and health care field. I also found that none of the top 10 high-income occupations is among the top 10 occupations with the most H1B petitions (computer systems analysts, software developers, etc.).
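The comparison in this paragraph, ranking occupations by median wage and checking them against the most frequent occupations, can be sketched like this. The occupation names appear in this report, but the wages and counts are invented for the example, and `top_by_median` and `top_by_count` are hypothetical helpers.

```python
from collections import Counter, defaultdict
from statistics import median


def top_by_median(rows, n):
    """Occupations with the n highest median wages."""
    wages = defaultdict(list)
    for soc, wage in rows:
        wages[soc].append(wage)
    ranked = sorted(wages, key=lambda soc: median(wages[soc]), reverse=True)
    return ranked[:n]


def top_by_count(rows, n):
    """Occupations with the n highest numbers of petitions."""
    counts = Counter(soc for soc, _ in rows)
    return [soc for soc, _ in counts.most_common(n)]


# Made-up wages: many moderately paid tech rows, few highly paid medical rows.
toy_rows = (
    [("COMPUTER SYSTEMS ANALYSTS", w) for w in (60000, 65000, 70000, 62000)]
    + [("COMPUTER PROGRAMMERS", w) for w in (58000, 61000, 64000)]
    + [("UROLOGISTS", 210000), ("PEDIATRICIANS", 190000)]
)

highest_paid = top_by_median(toy_rows, 2)
most_common = top_by_count(toy_rows, 2)
overlap = set(highest_paid) & set(most_common)
print(highest_paid, most_common, overlap)
```

An empty `overlap` corresponds to the observation that the best-paid occupations are not the ones with the most petitions.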
In general, the prevailing wages of denied H1B cases have more extreme values than those of certified H1B cases. The 1st quartile, median and 3rd quartile of prevailing wages for certified cases are each greater than the corresponding values for denied cases.
For each year, the bulk of H1B applicants have salaries between $50,000 and $70,000. The distribution of wage for each year is right skewed.
| YEAR | mean_wage | median_wage | 10th_percentile | 90th_percentile |
|---|---|---|---|---|
| 2011 | 65088.39 | 61173 | 40123.0 | 94952 |
| 2012 | 66658.97 | 62546 | 42099.0 | 96138 |
| 2013 | 68426.64 | 63898 | 44616.0 | 98342 |
| 2014 | 69405.36 | 64688 | 45406.0 | 98675 |
| 2015 | 70561.24 | 66019 | 47247.2 | 99939 |
| 2016 | 72778.78 | 68141 | 48318.0 | 104118 |
This plot depicts the mean, median, 10th percentile and 90th percentile of wages for all the H1B workers in each year. From 2011 to 2016, wages for H1B applicants increased gradually at a relatively low rate. The mean wage is larger than the median wage in every year, which indicates that the wage distribution is right skewed: the high-wage outliers are extreme enough to drag the mean up.
Over the six-year period, average wages for foreign workers with certified H1B cases are higher than for those with denied cases. Some denied cases even list wages of $0.
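The four per-year statistics in the table and plot can be sketched as follows. The data is synthetic, `yearly_wage_summary` is a hypothetical helper, and the percentile method ("inclusive" deciles) is an assumption, so the numbers are not those of the table above.

```python
from collections import defaultdict
from statistics import mean, median, quantiles


def yearly_wage_summary(rows):
    """Mean, median, 10th and 90th percentile of wage for each year."""
    by_year = defaultdict(list)
    for year, wage in rows:
        by_year[year].append(wage)
    summary = {}
    for year in sorted(by_year):
        wages = by_year[year]
        deciles = quantiles(wages, n=10, method="inclusive")
        summary[year] = {
            "mean": mean(wages),
            "median": median(wages),
            "p10": deciles[0],    # first decile = 10th percentile
            "p90": deciles[-1],   # ninth decile = 90th percentile
        }
    return summary


# Synthetic wages; 2012 gets one extreme value to show how it moves the mean.
toy_rows = [(2011, 40000 + 500 * i) for i in range(101)]
toy_rows += [(2012, 42000 + 500 * i) for i in range(101)]
toy_rows += [(2012, 1_000_000)]

summary = yearly_wage_summary(toy_rows)
for year, stats in summary.items():
    print(year, stats)
```

In the synthetic 2012 data a single extreme wage lifts the mean well above the median, which is the same pattern the table shows for every year.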
Microsoft, one of the biggest tech companies, has the highest average wage for employees with certified H1B. Big consulting firms such as Accenture and Deloitte also tend to hire many foreign workers and offer them good salaries.
Despite the fact that certified H1B applicants in the medical science and health care field account for only a small fraction of the total H1B applicants, these people earn much more than people in other fields. Physicians and surgeons have an average annual salary above $200,000.
When investigating the trend of wages from 2011 to 2016, I found that the average and median wages for H1B workers went up steadily.
California, New York and Texas are the top 3 states that filed the most H1B visa applications, so I decided to compare the H1B case quantity of each state per year. From the bar plot above, it’s clear that California outnumbered the other two in each year. In 2011 and 2012, Texas had fewer certified H1B workers than New York state, but Texas had nearly caught up by 2013 and exceeded New York in 2014, 2015 and 2016. The number of certified H1B workers grew in all three states over the six years, apart from a slight dip in New York in 2013.
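The statements in this paragraph can be checked directly against the per-state table shown earlier in this report. The sketch below copies those counts; the helper functions are hypothetical names introduced for the illustration.

```python
# Certified case counts per year, copied from the state table in this report.
certified_by_state = {
    2011: {"CALIFORNIA": 56252, "NEW YORK": 35244, "TEXAS": 26851},
    2012: {"CALIFORNIA": 64537, "NEW YORK": 37086, "TEXAS": 31841},
    2013: {"CALIFORNIA": 72171, "NEW YORK": 36460, "TEXAS": 36408},
    2014: {"CALIFORNIA": 85164, "NEW YORK": 42169, "TEXAS": 45091},
    2015: {"CALIFORNIA": 100710, "NEW YORK": 47703, "TEXAS": 55066},
    2016: {"CALIFORNIA": 104070, "NEW YORK": 51293, "TEXAS": 59694},
}


def first_year_exceeding(table, state_a, state_b):
    """First year in which state_a has more cases than state_b, else None."""
    for year in sorted(table):
        if table[year][state_a] > table[year][state_b]:
            return year
    return None


def year_over_year_declines(table):
    """(state, year) pairs where the count fell compared with the prior year."""
    years = sorted(table)
    declines = []
    for prev, cur in zip(years, years[1:]):
        for state in table[cur]:
            if table[cur][state] < table[prev][state]:
                declines.append((state, cur))
    return declines


def growth(table, state):
    """Relative growth of a state's count from the first to the last year."""
    years = sorted(table)
    first, last = table[years[0]][state], table[years[-1]][state]
    return (last - first) / first


print(first_year_exceeding(certified_by_state, "TEXAS", "NEW YORK"))  # 2014
print(year_over_year_declines(certified_by_state))
for state in ("CALIFORNIA", "NEW YORK", "TEXAS"):
    print(state, f"{growth(certified_by_state, state):.1%}")
```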
Year and quantity of H1B petitions are positively correlated, and the same applies to year and prevailing wage. The US economy grew steadily over the past six years, bringing more job opportunities to the US job market as well as widespread wage increases.
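The positive relationship between year and wage can be quantified with a Pearson correlation coefficient. The sketch below applies it to the yearly median wages from the table earlier in this report; with only six yearly points it describes the trend rather than proving anything about its cause. `pearson` is a small helper written for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Median wage per year, copied from the yearly wage table in this report.
years = [2011, 2012, 2013, 2014, 2015, 2016]
median_wage = [61173, 62546, 63898, 64688, 66019, 68141]

r = pearson(years, median_wage)
print(f"correlation between year and median wage: {r:.3f}")
```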
California, Texas and New York are the top 3 states with the most H1B applicants over the past six years. I plotted the locations using the longitude and latitude variables of each record so that it’s easier to see the distribution of H1B applicants on the map.
The dots cluster around the San Francisco Bay Area, Los Angeles and San Diego in California. The metropolitan areas in Texas with the most H1B applicants are the Dallas-Fort Worth area and Houston. In New York state, NYC including Long Island is far ahead of any other city.
Next, I plotted a stacked bar chart to investigate the trend in the proportion of H1B applicants who have computer or mathematical occupations from 2011 to 2016. The proportion of computer or mathematical occupations among all occupations gradually increased over the years. With the Internet industry flourishing, the demand for talent with computer-related skills has been growing year by year.
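The yearly proportion behind such a stacked bar chart can be sketched as below. How occupations were classified as computer or mathematical is not stated in this report, so the keyword match on SOC_NAME is purely an assumption for illustration, as are the keyword list, the helper names and the toy rows.

```python
from collections import defaultdict

# Assumption: an occupation counts as computer/mathematical if its SOC_NAME
# contains one of these keywords. The real analysis may have used SOC codes.
COMPUTER_MATH_KEYWORDS = ("COMPUTER", "SOFTWARE", "MATHEMATIC", "STATISTIC")


def is_computer_or_math(soc_name):
    name = soc_name.upper()
    return any(keyword in name for keyword in COMPUTER_MATH_KEYWORDS)


def share_by_year(rows):
    """Share of computer/mathematical occupations per year for (year, soc) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, soc_name in rows:
        totals[year] += 1
        hits[year] += is_computer_or_math(soc_name)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy rows (made up); the occupation names appear in this report.
toy_rows = [
    (2011, "COMPUTER PROGRAMMERS"),
    (2011, "FINANCIAL ANALYSTS"),
    (2016, "COMPUTER SYSTEMS ANALYSTS"),
    (2016, "SOFTWARE DEVELOPERS, APPLICATIONS"),
    (2016, "COMPUTER PROGRAMMERS"),
    (2016, "MECHANICAL ENGINEERS"),
]
shares = share_by_year(toy_rows)
print(shares)
```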
There is an obvious increase in the wages for computer and mathematical occupations. The median wage jumped from 2011 to 2012; after that we can see a steady increase.
For other occupations, by contrast, there are some fluctuations in the median wage and no apparent increase over the years. Besides, the variance in wages for other occupations across the six years is larger than that for computer and mathematical occupations.